Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection
Authors
Abstract
Nearest neighbor graphs are widely used in data mining and machine learning. A brute-force method to compute the exact kNN graph takes Θ(dn^2) time for n data points in the d-dimensional Euclidean space. We propose two divide-and-conquer methods for computing an approximate kNN graph in Θ(dn^t) time for high-dimensional data (large d). The exponent t ∈ (1,2) is an increasing function of an internal parameter α which governs the size of the common region in the divide step. Experiments show that a high-quality graph can usually be obtained with small overlaps, that is, for small values of t. A few of the practical details of the algorithms are as follows. First, the divide step uses an inexpensive Lanczos procedure to perform recursive spectral bisection. After each conquer step, an additional refinement step is performed to improve the accuracy of the graph. Finally, a hash table is used to avoid repeating distance calculations during the divide-and-conquer process. The combination of these techniques is shown to yield quite effective algorithms for building kNN graphs.
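The dependence of t on α can be made concrete with a short recurrence. This is a sketch under an assumption the abstract does not state: each divide step splits n points into two halves of (1+α)n/2 points each (so the halves share about αn points), and the divide and merge work at a node of size n is Θ(dn), for instance a fixed number of Lanczos matrix-vector products with the n × d data matrix. The master theorem then gives:

```latex
T(n) = 2\,T\!\left(\frac{(1+\alpha)\,n}{2}\right) + \Theta(dn)
\quad\Longrightarrow\quad
T(n) = \Theta\!\left(d\,n^{t}\right),
\qquad
t = \log_{2/(1+\alpha)} 2 = \frac{1}{1 - \log_2(1+\alpha)}.
```

Under this assumption t increases with α and lies in (1,2) exactly when 0 < α < √2 − 1 ≈ 0.414; for example, α = 0.1 gives t ≈ 1.16 and α = 0.2 gives t ≈ 1.36.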
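The pipeline the abstract describes (spectral bisection, overlapping subsets, conquer, refinement, and a distance hash table) can be sketched in plain Python. This is a minimal, hypothetical sketch, not the authors' implementation. Three assumptions are worth flagging: power iteration stands in for the Lanczos procedure, each half keeps ceil((1+α)n/2) points ordered by their projection score, and the refinement step is assumed to be a neighbors-of-neighbors pass. The names knn_graph, DistanceCache, projection, brute_force, merge, and refine are illustrative only.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Graph = Dict[int, List[Tuple[float, int]]]  # index -> sorted [(squared distance, neighbor)]


class DistanceCache:
    """Hash table of squared distances keyed by the unordered index pair, so a
    pair seen in overlapping subsets or during refinement is computed once."""

    def __init__(self, data: Sequence[Point]) -> None:
        self.data = data
        self.table: Dict[Tuple[int, int], float] = {}

    def __call__(self, i: int, j: int) -> float:
        key = (i, j) if i < j else (j, i)
        if key not in self.table:
            self.table[key] = sum((a - b) ** 2 for a, b in zip(self.data[i], self.data[j]))
        return self.table[key]


def projection(data: Sequence[Point], idx: List[int], iters: int = 30) -> List[float]:
    """Scores of the centered points along an approximate top singular direction.
    Assumption: plain power iteration stands in for the Lanczos procedure."""
    d = len(data[idx[0]])
    mean = [sum(data[i][c] for i in idx) / len(idx) for c in range(d)]
    rows = [[data[i][c] - mean[c] for c in range(d)] for i in idx]
    v = [1.0 / math.sqrt(d)] * d  # deterministic start vector
    for _ in range(iters):
        xv = [sum(r[c] * v[c] for c in range(d)) for r in rows]  # X v
        w = [sum(xv[t] * rows[t][c] for t in range(len(rows))) for c in range(d)]  # X^T X v
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return [sum(r[c] * v[c] for c in range(d)) for r in rows]


def brute_force(idx: List[int], k: int, dist: DistanceCache) -> Graph:
    """Exact kNN lists restricted to the points in idx (the base case)."""
    return {i: heapq.nsmallest(k, ((dist(i, j), j) for j in idx if j != i)) for i in idx}


def merge(lists: List[List[Tuple[float, int]]], k: int) -> List[Tuple[float, int]]:
    """Union of candidate lists with duplicate neighbors dropped; keep the k closest."""
    best: Dict[int, float] = {}
    for lst in lists:
        for dval, j in lst:
            best[j] = dval
    return heapq.nsmallest(k, ((dval, j) for j, dval in best.items()))


def refine(graph: Graph, idx: List[int], k: int, dist: DistanceCache) -> Graph:
    """Assumed refinement: also test each point's neighbors' neighbors."""
    out: Graph = {}
    for i in idx:
        cand = {m for _, j in graph[i] for _, m in graph[j]} - {i}
        out[i] = merge([graph[i], [(dist(i, m), m) for m in cand]], k)
    return out


def knn_graph(data: Sequence[Point], k: int, alpha: float = 0.2, leaf: int = 16) -> Graph:
    """Approximate kNN graph by recursive bisection into overlapping halves."""
    dist = DistanceCache(data)

    def solve(idx: List[int]) -> Graph:
        n = len(idx)
        h = math.ceil((1 + alpha) * n / 2)  # each half keeps about (1 + alpha) n / 2 points
        if n <= leaf or h >= n:
            return brute_force(idx, k, dist)
        order = [i for _, i in sorted(zip(projection(data, idx), idx))]
        left, right = solve(order[:h]), solve(order[n - h:])  # halves share 2h - n points
        merged = {i: merge([left.get(i, []), right.get(i, [])], k) for i in idx}
        return refine(merged, idx, k, dist)

    return solve(list(range(len(data))))
```

For example, knn_graph(points, k=5, alpha=0.3) returns, for each point index, five approximate nearest neighbors as (squared distance, index) pairs; the sketch assumes leaf ≥ 2k so that every base case holds more than k points. Because DistanceCache keys on the unordered pair, a distance first computed in one overlapping half is reused by the other half and by the refinement pass instead of being recomputed.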
Similar resources
EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph
Approximate nearest neighbor (ANN) search is a fundamental problem in many areas of data mining, machine learning and computer vision. The performance of traditional hierarchical structure (tree) based methods decreases as the dimensionality of data grows, while hashing based methods usually lack efficiency in practice. Recently, the graph based methods have drawn considerable attention. The ma...
Fast kNN Graph Construction with Locality Sensitive Hashing
The k nearest neighbors (kNN) graph, perhaps the most popular graph in machine learning, plays an essential role for graph-based learning methods. Despite its many elegant properties, the brute force kNN graph construction method has computational complexity of O(n^2), which is prohibitive for large scale data sets. In this paper, based on the divide-and-conquer strategy, we propose an efficient a...
A Parallel Implementation of Multilevel Recursive Spectral Bisection for Application to Adaptive Unstructured Meshes
The design of a parallel implementation of multilevel recursive spectral bisection is described. The goal is to implement a code that is fast enough to enable dynamic repartitioning of adaptive meshes. 1 Background. The Recursive Spectral Bisection (RSB) algorithm is one of a class of recursive bisection methods for partitioning unstructured problems [4]. RSB is typically used as a preprocessi...
Relaxed Implementation of Spectral Methods for Graph Partitioning
This paper presents a fast implementation of the recursive spectral bisection method for p-way partitioning. It is known that recursive bisections for p-way partitioning using optimal strategies at each step may not lead to a good overall solution. The relaxed implementation accelerates the partitioning process by relaxing the accuracy requirement of the spectral bisection (SB) method. Considering...
Distributed computation of the knn graph for large high-dimensional point sets
High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for...
Journal title:
- Journal of Machine Learning Research
Volume 10, Issue
Pages -
Publication date: 2009